Easy-ICD - a tool to predict ICD-10 gastro codes from Swedish discharge summaries


Andrius Budrionis a,c, Taridzo Chomutare a,c, Therese Olsen Svenning a, Hercules Dalianis a,b

Introduction
Studies conducted in Norway and Sweden show that manual ICD-10 diagnosis coding is both time-consuming and prone to errors. Reported error rates in Norway reach 20-30 percent (Riksrevisjonen 2016), and in Sweden the main diagnosis was missing in 0.9% of hospital encounters and 10% of primary care encounters.

The digital storage of vast amounts of unstructured data in Electronic Patient Record systems calls for machine learning algorithms that can assist clinical coders in their daily work. The effectiveness of Natural Language Processing and Artificial Intelligence methods in text processing tasks, including text classification, topic modeling, machine translation, and text summarization, is well documented, and these algorithms are already in use in various fields. These factors have paved the way for the development of Easy-ICD, a tool that suggests ICD-10 codes for discharge summaries based on their free-text content.

Methods
The core of Easy-ICD is a machine learning model trained to predict ICD-10 diagnosis codes. The model was developed by continuing the pretraining of the Swedish general-language model KB-BERT on 17.8 GB of deidentified Swedish clinical text. This process resulted in a model called SweDeClinBERT (Vakili et al. 2022), which was extended with a classification layer and fine-tuned on 317,971 Swedish discharge summaries from 113,174 patients (Lamproudis et al. 2023) for the ICD-10 diagnosis code prediction task. The final model was evaluated in two scenarios: the most common codes (top 80%) and all codes.
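The "most common (top 80%)" scenario is not defined further in this abstract. One possible reading, sketched here in Python, keeps the most frequent codes until they account for 80% of all code assignments in the training data. The function names, the toy data, and the coverage-based interpretation are assumptions made for illustration and are not taken from the authors' pipeline.

```python
from collections import Counter


def select_frequent_codes(code_lists, coverage=0.8):
    """Return the most frequent codes, in descending frequency, until they
    cover at least `coverage` of all code assignments.

    ASSUMPTION: "top 80%" is read as cumulative coverage of assignments;
    the source does not say how the cut-off was actually defined.
    """
    counts = Counter(code for codes in code_lists for code in codes)
    total = sum(counts.values())
    selected, covered = [], 0
    for code, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(code)
        covered += n
    return selected


def restrict_labels(code_lists, allowed):
    """Drop every code that is not in `allowed` from each summary's labels."""
    allowed = set(allowed)
    return [[c for c in codes if c in allowed] for codes in code_lists]


# Toy label sets, one list of ICD-10 K-codes per discharge summary (made up).
summaries = [
    ["K35", "K80"],
    ["K35", "K80"],
    ["K35", "K57"],
    ["K35"],
    ["K80", "K57"],
    ["K92"],
]

frequent = select_frequent_codes(summaries, coverage=0.8)
filtered = restrict_labels(summaries, frequent)
print(frequent)
print(filtered)
```

With this reading, the "all codes" scenario corresponds to skipping the filtering step and keeping every label.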

Results
The performance of SweDeClinBERT fine-tuned for the ICD-10 diagnosis code prediction task is listed in Table 1.

Table 1. Predictive performance of SweDeClinBERT fine-tuned for the ICD-10 code prediction task

To make the model available to end users, a web application was developed and deployed online (https://easy-icd.ehealthresearch.no/). The user interface of the Easy-ICD web application is shown in Figure 1.
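The internals of the web application are not described in this abstract. As a rough sketch of how per-code model scores could be turned into a short list of suggestions for a coder, the Python example below assumes that the classifier emits one independent score (logit) per ICD-10 code. The function name, the threshold, the `top_k` cut-off, and the example scores are all hypothetical.

```python
import math


def suggest_codes(logits, top_k=3, threshold=0.5):
    """Rank codes by sigmoid probability and return at most `top_k`
    (code, probability) pairs whose probability is at least `threshold`.

    ASSUMPTION: multi-label setup with one logit per code; the real
    Easy-ICD service may score and filter suggestions differently.
    """
    # Plain sigmoid; adequate for moderate logits such as those used here.
    probs = {code: 1.0 / (1.0 + math.exp(-z)) for code, z in logits.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = [(code, round(p, 3)) for code, p in ranked if p >= threshold]
    return kept[:top_k]


# Made-up scores for one discharge summary.
example_logits = {"K35": 2.0, "K80": 0.0, "K57": -1.5, "K92": 1.0}

suggestions = suggest_codes(example_logits, top_k=3)
for code, prob in suggestions:
    print(code, prob)
```

Capping the list and applying a threshold keeps the suggestions short enough to review quickly; suitable values would need to be chosen on validation data.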

Figure 1. User interface of the Easy-ICD application

A user study of the Easy-ICD application is currently in progress. It examines how the tool affects clinical coding practice in terms of coding quality and time spent.

Conclusions
This work demonstrates the feasibility of developing a tool that suggests ICD-10 diagnosis codes from clinical narratives with relatively high accuracy (Table 1). The current work is limited to the Swedish language and to the K-codes (diseases of the digestive system) in the ICD-10 hierarchy. The findings suggest, however, that addressing these limitations depends mainly on access to relevant datasets. The methodology can be reused and scaled to cover all ICD-10 codes, and it can be adapted to support other languages. Work to reproduce these findings in Norwegian is ongoing.


References
  1. Lamproudis, A., Olsen Svenning, T., Torsvik, T., Chomutare, T., Budrionis, A., Dinh Ngo, P., Vakili, T. and Dalianis, H. 2023. Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding. In Proceedings of the AMIA 2023 Annual Symposium, November 11-15, New Orleans, LA, USA.
  2. Riksrevisjonen. 2016. "Undersøkelse av medisinsk kodepraksis i helseforetakene" [Investigation of medical coding practice in the health trusts]. 2016-2017. https://www.riksrevisjonen.no/rapporter-mappe/no-2016-2017/medisinsk-kodepraksis-i-helseforetakene/.

a Norwegian Centre for E-health Research, Tromsø, Norway
b Department of Computer and Systems Sciences (DSV), Stockholm University, Kista, Sweden
c Faculty of Science and Technology, UiT The Arctic University of Norway, Tromsø, Norway
